
Genshin Impact: What Factors Might Affect Sales For a Banner?

First: What is Genshin Impact?

Genshin Impact is a free-to-play game that spans the action, role-playing (RPG), and open-world genres. It is also a 'gacha' game, in which in-game currency is spent to test one's luck at obtaining a highly rated character that helps improve one's account. The game was created by MiHoYo (now Hoyoverse), based in Shanghai, China.

The game is currently available on iOS and Android for mobile, on compatible PCs, and on the PlayStation 4 and 5, with a Nintendo Switch release due sometime in 2022.

As stated on Wikipedia: "Prior to its release the game had over 10 million registrations, with over half of that from outside China. According to some, the game was the biggest international release of any Chinese video game. In the lead up to release, the game won the Tokyo Game Show Media Awards 2020 public poll, ranking first among 14 other games."

Genshin Impact has grown rapidly since its launch, taking the game industry by storm and making a name for itself around the world.

The game generates revenue through its limited-time event wishes (character banners, as players call them) and has recently started selling "skins", or alternate outfits, for playable characters. It also has a Battle Pass system in which players complete challenges or objectives over the course of 6 weeks for rewards.

In this project, we analyze trends among various factors that may contribute to the success of a promotional character.

So, why care about a random game on the App Store?

image.png

As seen in the graphic above, Genshin Impact is estimated to have the highest first-year gaming revenue in recent times, slightly edging past Fortnite in the earnings estimate. To the average person it may look like "an anime game", yet it also surpasses "mainstream" games such as GTA V and Call of Duty: Modern Warfare in first-year sales. This indicates how strong the game's launch was.

If you are an outsider to this game, this tutorial offers insight into how the gacha system works in Genshin Impact and into some potential factors behind the success of a promotional character. You will also get to see some gameplay clips that we have recorded, which may help loop you in!

If you are a player, you may learn how others make their in-game decisions and the rationale behind them.

The Genshin Summon System

Genshin Impact uses a summon system, as previously mentioned, to obtain promotional, limited-time characters. All characters in Genshin Impact have a rating of either "4-star" or "5-star", and promotional characters belong to the 5-star (5*) category. Barring the five characters Jean, Qiqi, Mona, Keqing and Diluc, plus the main protagonist (Aether if male or Lumine if female), every 5* character is limited and available for a 3-week period only. Weapons follow a similar 4* and 5* categorization, but they have a slightly different summon system despite using the same summon currency as the promotional 5* character. If someone says "I'm rolling this banner", it means they are spending their summon currency on the promotional character or weapon of this patch.

The summons are typically called "Wishes" or "Rolls" (the in-game currencies are named Intertwined Fate and Acquaint Fate), and the number of rolls it takes to get the promotional character is called "Pity". For example, if someone says "I'm at 67 pity", they mean that they've used 67 wishes since their last 5* summon.
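
As a minimal sketch of how pity is counted, the snippet below walks through a list of roll rarities and tracks how many wishes have been used since the last 5*. The function name `track_pity` and the sample rolls are hypothetical and only illustrate the counting rule described above; they are not taken from the game or from paimon.moe.

```python
def track_pity(rarities):
    """Return (pity at each 5* pull, current pity) for a sequence of roll rarities."""
    pity_at_five_star = []
    current_pity = 0
    for stars in rarities:
        current_pity += 1          # every wish adds one to the counter
        if stars == 5:
            pity_at_five_star.append(current_pity)
            current_pity = 0       # the counter resets after a 5*
    return pity_at_five_star, current_pity


# Hypothetical history: a 5* on the 70th wish, then 67 more wishes without one.
rolls = [3] * 69 + [5] + [3] * 60 + [4] + [3] * 6
won_at, now_at = track_pity(rolls)
print(won_at, now_at)
```

In this made-up history the player would say "I'm at 67 pity", because 67 wishes have passed since the 5* pull.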

Click here for more detailed information on the Genshin Summon System.

Get a feel for how gameplay looks here: https://www.youtube.com/embed/XCzgt03R5sE

Data Collection

Paimon.moe

Many Genshin players use a website called paimon.moe to keep track of their Genshin Wish History. Paimon.moe is generally among the most trusted third-party tools for storing Wish History, since Hoyoverse deletes any records that are 6 months or older. It is also known for being well organized and informative, which is why so many players use it!

In addition, there is no public Genshin dataset for wishes, so players who want to see the data have to compile their own. For this tutorial, we'll use data obtained from here, give you the opportunity to send data our way to process, and use it for various analyses.

The first step is to collect roll data through paimon.moe as an Excel Spreadsheet. You can easily follow the instructions to gather the data and compile it into an Excel Sheet by clicking the link in the above paragraph.

Afterwards, we load all the Excel sheets into the workspace. To start, we import the libraries needed to complete the investigation and analysis in a Jupyter notebook. We then use the pandas library, a widely used library for data manipulation and analysis, to read each Excel file and build one collective wish table.

In addition, we removed any records dated before 11/2/2021 to limit the number of banners/characters in our table. This is because the data for earlier banners (each a 3-week period) is somewhat incomplete: as previously stated, Hoyoverse deletes any records that are 6 months or older. We account for this to streamline our dataset.

In [38]:
# all our imports
!pip3 install requests beautifulsoup4
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import toolz
import scipy.stats as stats
from bs4 import BeautifulSoup as bs4
from sklearn import datasets, svm, metrics
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_predict, cross_val_score
import statsmodels.formula.api as sms
from statsmodels.formula.api import ols
# the original cell imported statsmodels.api as sm and then rebound sm on a later line,
# so sm ends up referring to statsmodels.formula.api; we keep that final binding
import statsmodels.formula.api as sm

#combines extracted datasets from paimon.moe to create one big dataset
wish_columns = ['Type', 'Name', 'Time', '⭐', 'Pity', '#Roll', 'Group', 'Banner']
#the first export has no numeric suffix; the other ten are numbered 1 through 10
wish_files = ['paimonmoe_wish_history.xlsx'] + ['paimonmoe_wish_history' + str(i) + '.xlsx' for i in range(1, 11)]
wish_frames = [pd.DataFrame(pd.read_excel(path), columns=wish_columns) for path in wish_files]
dataset = pd.concat(wish_frames)
#keep only rolls made after the 11/2/2021 cutoff, then drop the unused Group column
dataset = dataset.loc[dataset['Time'] > '2021-11-02 00:00:00']
dataset = dataset.drop(['Group'], axis = 1)

dataset
Out[38]:
Type Name Time ⭐ Pity #Roll Banner
0 Weapon Ferrous Shadow 2021-11-12 12:15:27 3 1 1 Moment of Bloom
1 Character Sayu 2021-11-16 12:06:50 4 2 2 Moment of Bloom
2 Weapon Emerald Orb 2021-11-16 12:47:55 3 1 3 Moment of Bloom
3 Weapon Debate Club 2021-11-16 12:57:23 3 1 4 Moment of Bloom
4 Weapon Ferrous Shadow 2021-11-16 13:13:06 3 1 5 Moment of Bloom
... ... ... ... ... ... ... ...
790 Weapon Skyrider Sword 2022-05-11 07:01:08 3 1 84 The Herons Court
791 Character Razor 2022-05-11 07:01:14 4 5 85 The Herons Court
792 Weapon Cool Steel 2022-05-11 07:01:54 3 1 86 The Herons Court
793 Weapon Cool Steel 2022-05-12 12:06:43 3 1 87 The Herons Court
794 Weapon Slingshot 2022-05-14 15:27:28 3 1 88 The Herons Court

4379 rows × 7 columns
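
The date cutoff used above can be sketched on its own, without the Excel files. The rows below are hypothetical stand-ins for wish records; the only things carried over from the notebook are the `YYYY-MM-DD HH:MM:SS` timestamp format and the 11/2/2021 cutoff. Parsing the timestamps explicitly makes the comparison independent of whether `Time` was read as text or as a datetime.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
CUTOFF = datetime(2021, 11, 2)

# Hypothetical wish records as (name, time) pairs
records = [
    ("Debate Club", "2021-10-30 09:12:44"),
    ("Sayu", "2021-11-16 12:06:50"),
    ("Slingshot", "2022-05-14 15:27:28"),
]

def after_cutoff(rows, cutoff=CUTOFF):
    """Keep only the rows whose timestamp is strictly later than the cutoff."""
    return [(name, time) for name, time in rows
            if datetime.strptime(time, TIME_FORMAT) > cutoff]

kept = after_cutoff(records)
print(kept)
```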

Next, we built a dictionary to count how many times each banner was rolled in our dataset. From it we made a table of each featured character and the total number of rolls on their banner, and then plotted a bar graph with the character on the x-axis and the number of wishes on the y-axis. This gives us an idea of how each banner performed in our data. Naturally, the more rolls a banner receives, the better it performed, since players were willing to spend more rolls on it.

In [39]:
#plots the amount of rolls per character in our dataset via bar graph
#count how many rolls in the dataset landed on each banner
counter = {}
for c in dataset['Banner']:
    if c not in counter:
        counter[c] = 0
    counter[c] += 1

fig = plt.figure(figsize=(22, 15))
ax = fig.add_axes([0,0,1,1])
#featured character of each banner and their roll totals (the totals sum to the 4379 rows of the dataset)
characters = ['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu', 'Yae Miko', 'Raiden Shogun', 
                  'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti', 'Kamisato Ayaka']
count = [227, 425, 113, 561, 413, 102, 63, 531, 431, 336, 52, 498, 49, 578]
ax.bar(characters, count)
ax.set_ylabel('# of Rolls on the Banner')
ax.set_xlabel('Character')
ax.set_title('Total Number of Rolls per Banner with a Given Character in Genshin')
plt.show()
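
The counting loop above can also be sketched with `collections.Counter` from the standard library. The list of banner names below is a small hypothetical sample rather than the full dataset, and the `featured` mapping is written out by hand for illustration; in the notebook the same idea would be applied to the `Banner` column.

```python
from collections import Counter

# Hypothetical sample of the Banner column
banners = ["Moment of Bloom", "Moment of Bloom", "The Herons Court",
           "Oni's Royale", "The Herons Court", "The Herons Court"]

# Hand-written mapping from banner name to its featured 5* character
featured = {"Moment of Bloom": "Hu Tao",
            "The Herons Court": "Kamisato Ayaka",
            "Oni's Royale": "Arataki Itto"}

rolls_per_banner = Counter(banners)
rolls_per_character = {featured[b]: n for b, n in rolls_per_banner.items()}

# most_common() sorts from the most-rolled banner to the least-rolled one
for banner, n in rolls_per_banner.most_common():
    print(featured[banner], n)
```

Keeping the banner-to-character mapping in one dictionary avoids having to keep two parallel lists in the same order by hand.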

Factors That We Think Affect Sales of a Promotional Character

We are going to explore trends between the following:

  • Characters' Japanese Voice Acting Cast (The Seiyuus)
    • Often when a character debuts in Genshin Impact and their voice actor is announced, there is a buzz around who they have voiced in popular anime or anime-affiliated works. We chose to pick up this notion and try to evaluate it.
  • Drip Marketing
    • The term used when a character is first revealed by Genshin Impact's official social media accounts, such as Twitter.
    • Here we collect the number of likes and retweets on the post to measure the social media response.
  • Survey Data
    • We collected the responses of several people at random across Genshin Impact affiliated Discord Servers and peers who play the game.
    • We will see if the respective response options are consistent with the "global" data where applicable.
    • Other data was collected in the survey for reasons we'll explain later.

Web Scraping for Seiyuu Data

We will start looking at the factors that go into rolls with the seiyuus (voice actors/actresses) behind the characters.

We web scraped MyAnimeList's Top 150 entries, and for the seiyuus who did not make the Top 150, we manually keyed in the data. Some entries were hard to find, such as Arataki Itto's seiyuu, who only has a music entry of his own on MAL, which we substituted in for his ratings.

We'll implement this using Beautiful Soup since it was the easiest way to handle it.

In [40]:
#Web Scrape data from MyAnimeList.net's 150 most-favorited people using Beautiful Soup
#(mostly voice actors/actresses, although other industry figures such as Hayao Miyazaki also appear)
#Rating is measured by number of people who have "Favorited" that person.
#Final data is stored in a pandas dataframe called seiyuu_table_final

def scrape_mal_people(url):
    #Parse one page (50 entries) of the MyAnimeList people ranking into a dataframe
    seiyuu_data = requests.get(url)
    soup = bs4(seiyuu_data.content,'html.parser')
    table = soup.find('table')

    page_table = pd.DataFrame(columns = ['rank', 'name', 'birthday','favorites'], index = range(0,50))
    row_marker = 0
    for row in table.find_all('tr'):
        column_marker = 0
        columns = row.find_all('td')
        row_marker += 1
        for column in columns:
            if (row_marker > 1): #skip the header row
                page_table.iat[row_marker-2,column_marker] = column.get_text()
                column_marker += 1
    #strip the newlines that surround the text of every cell
    for col in page_table.columns:
        page_table[col] = page_table[col].str.replace('\n','')
    return page_table

#each page has 50 entries, so we will repeat the process 3 times for 3*50 = 150 table entries
url = 'https://myanimelist.net/people.php'
url2 = 'https://myanimelist.net/people.php?limit=50'   #Entries 51-100 on MyAnimeList
url3 = 'https://myanimelist.net/people.php?limit=100'  #Entries 101-150 on MyAnimeList
seiyuu_table = scrape_mal_people(url)
seiyuu_table2 = scrape_mal_people(url2)
seiyuu_table3 = scrape_mal_people(url3)

#Merge the data from all 3 pages from MyAnimeList
seiyuu_table_final = pd.concat([seiyuu_table,seiyuu_table2, seiyuu_table3])

#Remove the Japanese text to preserve English text in the form "LastName, FirstName"
#([A-Za-z] is used because [A-z] would also match the punctuation between 'Z' and 'a')
series = pd.Series(seiyuu_table_final['name'])
seiyuu_table_final['name'] = series.str.extract(pat = ('([A-Za-z]+, [A-Za-z]+)'))

seiyuu_table_final
Out[40]:
rank name birthday favorites
0 1 Kamiya, Hiroshi Jan 28, 1975 102,013
1 2 Hanazawa, Kana Feb 25, 1989 98,144
2 3 Miyano, Mamoru Jun 8, 1983 85,033
3 4 Kaji, Yuuki Sep 3, 1985 70,286
4 5 Miyazaki, Hayao Jan 5, 1941 66,127
... ... ... ... ...
45 146 Koyama, Rikiya Dec 18, 1963 4,114
46 147 Furukawa, Makoto Sep 29, 1989 4,062
47 148 Penkin, Kevin May 22, 1992 4,040
48 149 Morikawa, Toshiyuki Jan 26, 1967 4,030
49 150 Tezuka, Osamu Nov 3, 1928 4,006

150 rows × 4 columns
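
The name clean-up step relies on a regular expression to pull the romanized "LastName, FirstName" part out of the scraped cell. The sketch below uses Python's `re` module on made-up strings (the exact text MyAnimeList returns may differ) and shows why the character class should be `[A-Za-z]`: in ASCII, the range `A-z` also covers the six punctuation characters that sit between `Z` and `a`.

```python
import re

NAME_PATTERN = re.compile(r"([A-Za-z]+, [A-Za-z]+)")
LOOSE_PATTERN = re.compile(r"([A-z]+, [A-z]+)")

def extract_name(cell_text):
    """Return the first 'LastName, FirstName' match, or None if there is none."""
    match = NAME_PATTERN.search(cell_text)
    return match.group(1) if match else None

# Hypothetical cell contents mixing romanized and Japanese text
print(extract_name("Hanazawa, Kana 花澤 香菜"))
print(extract_name("no romanized name here"))

# The loose class accepts characters such as '_' and '^' as if they were letters
loose = LOOSE_PATTERN.search("foo_bar, baz^qux").group(1)
strict = NAME_PATTERN.search("foo_bar, baz^qux").group(1)
print(loose)
print(strict)
```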

In [4]:
#Web Scrape Japanese voice acting cast data for Genshin Impact Characters
url4 = 'https://gamewith.net/genshin-impact/article/show/22638'
genshin_seiyuu_data = requests.get(url4)
soup = bs4(genshin_seiyuu_data.content, 'html.parser')
table = soup.find('table')

genshin_seiyuu_table = pd.DataFrame(columns = ['Character', 'Seiyuu'], index = range(0,27))
row_marker = 0
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    row_marker += 1
    for column in columns:
        if (row_marker > 1):
            genshin_seiyuu_table.iat[row_marker-2,column_marker] = column.get_text()
            column_marker += 1

seiyuu_dict = {'Character':['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu', 'Yae Miko', 'Raiden Shogun', 
                  'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti', 'Kamisato Ayaka', 'Yelan'], 'Seiyuu': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', ''], 
               '#MAL Favorites'  :['', '', '', '', '', '', '', '', '', '', '', '', '', '', '']}
seiyuu_df = pd.DataFrame(seiyuu_dict)

#Format the voice actor/actress names to LastName, FirstName
series = pd.Series(genshin_seiyuu_table['Seiyuu'])
genshin_seiyuu_table['Seiyuu'] = series.str.extract(pat = ('(JP: [A-Za-z]+ [A-Za-z]+)'))
genshin_seiyuu_table['Seiyuu'] = genshin_seiyuu_table['Seiyuu'].str.replace('JP: ', '')
genshin_seiyuu_table['Seiyuu'] = genshin_seiyuu_table['Seiyuu'].str.replace(' ', ', ')

#Copy each seiyuu into seiyuu_df. Keys are the character names as scraped; each value is
#(row in seiyuu_df, manually keyed name), where None means "keep the scraped name"
seiyuu_rows = {'Hu Tao': (0, None), 'Eula': (1, 'Satou, Rina'), 'Albedo': (2, 'Nojima, Kenji'),
               'Itto': (3, None), 'Shenhe': (4, None), 'Xiao': (5, 'Matsuoka, Yoshitsugu'),
               'Zhongli': (6, 'Maeno, Tomoaki'), 'Ganyu': (7, None), 'Yae Miko': (8, 'Sakura, Ayane'),
               'Raiden Shogun': (9, 'Sawashiro, Miyuki'), 'Kokomi': (10, 'Mimori, Suzuko'),
               'Ayato': (11, None), 'Venti': (12, None), 'Ayaka': (13, None)}
for index, vas in genshin_seiyuu_table.iterrows():
    if vas['Character'] in seiyuu_rows:
        df_row, manual_name = seiyuu_rows[vas['Character']]
        seiyuu_df.loc[df_row, 'Seiyuu'] = vas['Seiyuu'] if manual_name is None else manual_name

#Yelan is keyed in manually
seiyuu_df.loc[14, 'Seiyuu'] = 'Sakamoto, Maaya'

seiyuu_df
Out[4]:
Character Seiyuu #MAL Favorites
0 Hu Tao Takahashi, Rie
1 Eula Satou, Rina
2 Albedo Nojima, Kenji
3 Arataki Itto Nishikawa, Takanori
4 Shenhe Kawasumi, Ayako
5 Xiao Matsuoka, Yoshitsugu
6 Zhongli Maeno, Tomoaki
7 Ganyu Ueda, Reina
8 Yae Miko Sakura, Ayane
9 Raiden Shogun Sawashiro, Miyuki
10 Sangonomiya Kokomi Mimori, Suzuko
11 Kamisato Ayato Ishida, Akira
12 Venti Ayumu, Murase
13 Kamisato Ayaka Hayami, Saori
14 Yelan Sakamoto, Maaya
In [5]:
# Now we will fill in the MAL Favorites Data (some are manual) 
# and create a new column to indicate whether a character's seiyuu is in the MyAnimeList Top 150 or not 
In [6]:
seiyuu_df['MAL Top 150'] = False
#For every entry in the scraped Top 150, copy the favorites count to the matching seiyuu (if any)
for index, rows in seiyuu_table_final.iterrows():
    match = seiyuu_df['Seiyuu'] == rows['name']
    seiyuu_df.loc[match, '#MAL Favorites'] = rows['favorites']
    seiyuu_df.loc[match, 'MAL Top 150'] = True

#Fill in missing data for #MAL favorites on those who are not in the Top 150 
seiyuu_df.loc[1,'#MAL Favorites'] = '2,670'
seiyuu_df.loc[7,'#MAL Favorites'] = '3417'
seiyuu_df.loc[2,'#MAL Favorites'] = '965'
seiyuu_df.loc[3,'#MAL Favorites'] = '593' #This is from T.M Revolution which is the only data available about Takanori Nishikawa on MyAnimeList
seiyuu_df.loc[6,'#MAL Favorites'] = '3,662'
seiyuu_df.loc[10,'#MAL Favorites'] = '3,203'
seiyuu_df.loc[12,'#MAL Favorites'] = '5,262'




seiyuu_df
Out[6]:
Character Seiyuu #MAL Favorites MAL Top 150
0 Hu Tao Takahashi, Rie 45,055 True
1 Eula Satou, Rina 2,670 False
2 Albedo Nojima, Kenji 965 False
3 Arataki Itto Nishikawa, Takanori 593 False
4 Shenhe Kawasumi, Ayako 4,470 True
5 Xiao Matsuoka, Yoshitsugu 37,623 True
6 Zhongli Maeno, Tomoaki 3,662 False
7 Ganyu Ueda, Reina 3417 False
8 Yae Miko Sakura, Ayane 17,141 True
9 Raiden Shogun Sawashiro, Miyuki 37,907 True
10 Sangonomiya Kokomi Mimori, Suzuko 3,203 False
11 Kamisato Ayato Ishida, Akira 11,323 True
12 Venti Ayumu, Murase 5,262 False
13 Kamisato Ayaka Hayami, Saori 53,529 True
14 Yelan Sakamoto, Maaya 16,320 True
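
The fill-in logic above amounts to a lookup with a manual fallback. A minimal sketch, using a few of the values shown in the table above (the dictionary and function names are hypothetical):

```python
# Favorites scraped from the Top 150, as strings with thousands separators
top_150 = {"Takahashi, Rie": "45,055", "Hayami, Saori": "53,529"}

# Manually keyed favorites for seiyuu outside the Top 150
manual = {"Nojima, Kenji": "965", "Maeno, Tomoaki": "3,662"}

def lookup_favorites(name):
    """Return (favorites as int, whether the seiyuu is in the Top 150)."""
    in_top_150 = name in top_150
    raw = top_150[name] if in_top_150 else manual[name]
    return int(raw.replace(",", "")), in_top_150

print(lookup_favorites("Takahashi, Rie"))
print(lookup_favorites("Maeno, Tomoaki"))
```

Stripping the thousands separator before calling `int` is what lets the favorites be summed for double banners later on.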
In [7]:
# Genshin Sales Data from Japan and China

# Japan Data will be manually added from https://game-i.daa.jp/?%E3%82%AC%E3%83%81%E3%83%A3%E5%88%86%E6%9E%90%2F%E5%8E%9F%E7%A5%9E 
# Formatting the data via web scraping is difficult due to inaccurate English translations. 
# The website reports sales in units of 100 million yen, a standard counting unit in Japanese.
# For example, 22.71 G is 2.271 billion yen.

# NOTE: This is data for iOS sales only. We are currently unable to access data for PS4, PS5 and PC sales since the company does not disclose
#        this data. 

sales_JP = {} 
# Sales in billions of yen.
sales_JP['The Heron\'s Court'] = 2.271
sales_JP['Azure Excursion + Ballad in Goblets'] = 3.201
sales_JP['Reign of Serenity + Drifting Luminescence'] = 2.027
sales_JP['Everbloom Violet'] = 1.733
sales_JP['Gentry of Hermitage + Adrift in the Harbor'] = 2.394
sales_JP['Invitation to Mundane Life + The Transcendant One Returns'] = 2.347
sales_JP['Oni\'s Royale'] = 1.071
sales_JP['Secretum Secretorum + Born of Ocean Swell'] = 1.667
sales_JP['Moment of Bloom'] = 2.468

# ----------------------------------------------
def yen_billions_to_usd(yen):
    usd_per_yen = 0.0077 #As of 5/15/2022 
    # each entry is in billions of yen, so we multiply by 10^9 before converting
    return int(usd_per_yen * (yen * (1000000000)))

#We convert our sales from Japanese Yen to US Dollar for the sake of consistency
sales_JP = toolz.valmap(yen_billions_to_usd,sales_JP)
print('Japan')
sales_JP = pd.DataFrame(sales_JP.items(), columns = ['Banner','Sales Japan (in USD)'])
sales_JP
Japan
Out[7]:
Banner Sales Japan (in USD)
0 The Heron's Court 17486700
1 Azure Excursion + Ballad in Goblets 24647700
2 Reign of Serenity + Drifting Luminescence 15607900
3 Everbloom Violet 13344100
4 Gentry of Hermitage + Adrift in the Harbor 18433800
5 Invitation to Mundane Life + The Transcendant ... 18071900
6 Oni's Royale 8246700
7 Secretum Secretorum + Born of Ocean Swell 12835900
8 Moment of Bloom 19003600
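
The unit handling is easy to get wrong, so here is the conversion worked through as a small sketch. It assumes, as the comments above state, that the site's figures are in units of 100 million yen and that the exchange rate is 0.0077 USD per yen; the function and constant names are hypothetical.

```python
YEN_PER_SITE_UNIT = 100_000_000   # one site unit = 100 million yen
USD_PER_YEN = 0.0077              # rate used in this tutorial, as of 5/15/2022

def site_units_to_usd(units):
    """Convert a sales figure in units of 100 million yen to whole US dollars."""
    yen = units * YEN_PER_SITE_UNIT
    return round(yen * USD_PER_YEN)

# 22.71 site units = 2.271 billion yen
heron_usd = site_units_to_usd(22.71)
print(heron_usd)
```

This sketch uses `round` rather than `int` so that a floating-point result that lands a hair below a whole number is not truncated downward.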
In [8]:
# China Data will be manually added from https://www.genshinlab.com/genshin-impact-revenue-chart/
# This site is most commonly referred to when gauging character sales and is regularly maintained 
# by certified people across the Genshin community.

#This data from China is on iOS only. 

sales_CN = {} 
# Sales in USD as of 5/15/2022
sales_CN['The Heron\'s Court'] = 19897071
sales_CN['Azure Excursion + Ballad in Goblets'] = 22767455
sales_CN['Reign of Serenity + Drifting Luminescence'] = 33560259
sales_CN['Everbloom Violet'] = 15110264
sales_CN['Gentry of Hermitage + Adrift in the Harbor'] = 26780298
sales_CN['Invitation to Mundane Life + The Transcendant One Returns'] = 16994406
sales_CN['Oni\'s Royale'] = 13404072
sales_CN['Secretum Secretorum + Born of Ocean Swell'] = 17026066
sales_CN['Moment of Bloom'] = 25226952

sales_CN = pd.DataFrame(sales_CN.items(), columns = ['Banner','Sales China iOS (in USD)'])
print('China')
sales_CN
China
Out[8]:
Banner Sales China iOS (in USD)
0 The Heron's Court 19897071
1 Azure Excursion + Ballad in Goblets 22767455
2 Reign of Serenity + Drifting Luminescence 33560259
3 Everbloom Violet 15110264
4 Gentry of Hermitage + Adrift in the Harbor 26780298
5 Invitation to Mundane Life + The Transcendant ... 16994406
6 Oni's Royale 13404072
7 Secretum Secretorum + Born of Ocean Swell 17026066
8 Moment of Bloom 25226952
In [9]:
# Now we are going to make plots for Japan and China sales data versus the MyAnimeList favorite rating for their seiyuus. 
# Double banners will have the voice actors' favorites summed together. 
# For example: Azure Excursion + Ballad in Goblets would have Akira Ishida and Ayumu Murase's favorite ratings summed together on the x axis.
# Sales will always be on the y axis

plot_data = pd.DataFrame(columns = ['Banner', '#Seiyuu Favorites'], index = range(0,9))

plot_data['Banner'] = sales_CN['Banner']

plot_data['Characters'] = ['Kamisato Ayaka', 'Kamisato Ayato + Venti', 'Raiden Shogun + Sangonomiya Kokomi', 'Yae Miko','Zhongli + Ganyu', 
                          'Xiao + Shenhe', 'Arataki Itto', 'Albedo + Eula', 'Hu Tao']
seiyuu_df['#MAL Favorites'] = seiyuu_df['#MAL Favorites'].str.replace(',','')
#print(seiyuu_df)
plot_data.loc[0,'#Seiyuu Favorites'] = int(seiyuu_df.loc[13,'#MAL Favorites'])
plot_data.loc[1,'#Seiyuu Favorites'] = int(seiyuu_df.loc[11,'#MAL Favorites']) + int(seiyuu_df.loc[12,'#MAL Favorites'])
plot_data.loc[2,'#Seiyuu Favorites'] = int(seiyuu_df.loc[9,'#MAL Favorites']) + int(seiyuu_df.loc[10,'#MAL Favorites'])
plot_data.loc[3,'#Seiyuu Favorites'] = int(seiyuu_df.loc[8,'#MAL Favorites'])
plot_data.loc[4,'#Seiyuu Favorites'] = int(seiyuu_df.loc[6,'#MAL Favorites']) + int(seiyuu_df.loc[7,"#MAL Favorites"])
plot_data.loc[5,'#Seiyuu Favorites'] = int(seiyuu_df.loc[5,'#MAL Favorites']) + int(seiyuu_df.loc[4,'#MAL Favorites'])
plot_data.loc[6,'#Seiyuu Favorites'] = int(seiyuu_df.loc[3,'#MAL Favorites'])
plot_data.loc[7,'#Seiyuu Favorites'] = int(seiyuu_df.loc[1,'#MAL Favorites']) + int(seiyuu_df.loc[2,'#MAL Favorites'])
plot_data.loc[8,'#Seiyuu Favorites'] = int(seiyuu_df.loc[0,'#MAL Favorites']) 

plot_data['Japan Sales in USD'] = sales_JP['Sales Japan (in USD)'] 
plot_data['China Sales in USD'] = sales_CN['Sales China iOS (in USD)']
#print(plot_data)

# Keep a column of combined iOS sales in Japan and China
plot_data['China and Japan Combined Sales'] = plot_data['Japan Sales in USD'] + plot_data['China Sales in USD']

plot_data
Out[9]:
Banner #Seiyuu Favorites Characters Japan Sales in USD China Sales in USD China and Japan Combined Sales
0 The Heron's Court 53529 Kamisato Ayaka 17486700 19897071 37383771
1 Azure Excursion + Ballad in Goblets 16585 Kamisato Ayato + Venti 24647700 22767455 47415155
2 Reign of Serenity + Drifting Luminescence 41110 Raiden Shogun + Sangonomiya Kokomi 15607900 33560259 49168159
3 Everbloom Violet 17141 Yae Miko 13344100 15110264 28454364
4 Gentry of Hermitage + Adrift in the Harbor 7079 Zhongli + Ganyu 18433800 26780298 45214098
5 Invitation to Mundane Life + The Transcendant ... 42093 Xiao + Shenhe 18071900 16994406 35066306
6 Oni's Royale 593 Arataki Itto 8246700 13404072 21650772
7 Secretum Secretorum + Born of Ocean Swell 3635 Albedo + Eula 12835900 17026066 29861966
8 Moment of Bloom 45055 Hu Tao 19003600 25226952 44230552
In [29]:
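#constructing plot showing correlation between sales data in china & japan vs rolls for our character banner dataset
#rolls per banner from our dataset, in the same order as plot_data; double banners sum both characters' rolls
#NOTE: the last entry (Hu Tao, 454) is double the 227 rolls counted for her banner in the bar graph above
newcount = [578, 547, 388, 431, 594, 515, 561, 538, 454]
df = pd.DataFrame(newcount)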
annotations = plot_data['Characters'].to_numpy()
X_Plots = df[0].to_numpy()
Y_Plots =  plot_data['China and Japan Combined Sales'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 500, color = "red")
plt.xlabel("Rolls on Each Banner")
plt.ylabel("Sales Data in China and Japan (USD)")
plt.title("Sales Data in China and Japan (USD) vs Rolls for the Character of Banner ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.show()

This graph compares the number of rolls on each banner in our dataset with the combined sales in China and Japan (USD). The fitted line is counterintuitive, because common sense tells us that more rolls on a banner should mean more sales. This discrepancy between our roll data and the total sales data is most likely due to our small sample (4,379 rolls in total), so the regression line is not representative of the true relationship.
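
One way to put a number on how weak this relationship is would be the Pearson correlation coefficient between rolls and combined sales. The sketch below computes it by hand from the nine (rolls, sales) pairs plotted above; it is an illustration of the formula, not part of the original analysis, and the function name is hypothetical.

```python
from math import sqrt

rolls = [578, 547, 388, 431, 594, 515, 561, 538, 454]
sales = [37383771, 47415155, 49168159, 28454364, 45214098,
         35066306, 21650772, 29861966, 44230552]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

r = pearson_r(rolls, sales)
print(round(r, 3))
```

A value near 0 means the points barely follow a line at all, and a negative value means the fitted line slopes downward.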

Twitter Data

Twitter was our social media platform of choice for collecting the so-called drip marketing post data. We chose English Twitter because it had the most drip marketing posts available. The goal is to get a general idea of a character's popularity from the number of likes and retweets on their initial reveal post, which we measure by adding the two quantities together. This lets us use the data for analysis later.

Two minor drawbacks were that none of us are certified on the Twitter Developer Platform to scrape the data straight from Twitter using tweepy, and that not all drip marketing posts were available on Genshin Impact's Twitter. One of us applied to become a developer, but Twitter did not respond in time, so we ended up manually entering the Twitter data for the characters' drip marketing posts into an Excel file and reading the data from there. Yes, we quite literally scrolled through every Twitter post to gather the information. It is accurate as of 5/14/2022.

As pointed out several times in this tutorial, Hoyoverse deletes data that is more than 6 months old, so any data from further back may be inconsistent for the sake of calculations. Hence, for characters who did not have a drip marketing post available, we marked all of their columns as missing ('NaN'). We also collected the drip marketing data for Yelan, a character who is not out yet, which we will use for data analysis later on. Yelan was due to be Genshin Impact's newest 5* promotional character in the version 2.7 update, but she has not been released yet because the Shanghai COVID lockdown delayed the update. The update was originally scheduled for May 10, 2022, but Hoyoverse was forced to extend The Heron's Court wish banner until further notice.

Drip Market Post Example - Sangonomiya Kokomi

In [54]:
#extract genshin 'drip market' twitter data that is available
t = pd.read_excel('twitterdata.xlsx')
twitter = pd.DataFrame(t, columns = ['Character', 'Date', 'Likes', 'Retweets', 'Post Activity'])
twitter['Post Activity'] = twitter['Likes'] + twitter['Retweets']
twitter
Out[54]:
Character Date Likes Retweets Post Activity
0 Hu Tao NaT NaN NaN NaN
1 Eula 2021-05-11 63503.0 6873.0 70376.0
2 Albedo NaT NaN NaN NaN
3 Arataki Itto 2021-10-11 188590.0 34457.0 223047.0
4 Shenhe 2021-11-22 207072.0 36283.0 243355.0
5 Xiao NaT NaN NaN NaN
6 Zhongli NaT NaN NaN NaN
7 Ganyu NaT NaN NaN NaN
8 Yae Miko 2022-12-31 343000.0 73082.0 416082.0
9 Raiden Shogun 2021-07-22 88703.0 14687.0 103390.0
10 Sangonomiya Kokomi 2021-07-22 70249.0 10448.0 80697.0
11 Kamisato Ayato 2022-02-04 199244.0 45039.0 244283.0
12 Venti NaT NaN NaN NaN
13 Kamisato Ayaka 2021-06-07 103889.0 15472.0 119361.0
14 Yelan 2022-03-28 156659.0 24525.0 181184.0
In [55]:
#creates a new twitter dataframe to be worked with
#.copy() avoids pandas' SettingWithCopyWarning on the assignments below
twitter2 = twitter.dropna().copy()

#Raiden Shogun and Sangonomiya Kokomi are merged into a single row
twitter2.loc[9, 'Character'] = 'Raiden Shogun + Sangonomiya Kokomi'
twitter2.loc[9, 'Post Activity'] = twitter2.loc[9, 'Post Activity'] + twitter2.loc[10, 'Post Activity']
twitter2 = twitter2.drop([10])
twitter2 = twitter2.drop(['Likes', 'Retweets'], axis = 1)
twitter2 = twitter2.astype({'Post Activity': 'int'})

#reorders the rows; Yelan (index 14) is not listed here, so she is left out
twitter2 = twitter2.reindex([13, 11, 9, 8, 4, 3, 1])
predict = twitter2

twitter2
Out[55]:
Character Date Post Activity
13 Kamisato Ayaka 2021-06-07 119361
11 Kamisato Ayato 2022-02-04 244283
9 Raiden Shogun + Sangonomiya Kokomi 2021-07-22 184087
8 Yae Miko 2022-12-31 416082
4 Shenhe 2021-11-22 243355
3 Arataki Itto 2021-10-11 223047
1 Eula 2021-05-11 70376
In [43]:
#pie chart creation for the dataframe above
pie = np.array(twitter2['Post Activity'])
labelz = twitter2['Character']

def value(val):
    return '{:.0f}%\n({:.0f})'.format(val, np.round(val/100.*pie.sum(),0))

plt.pie(pie, labels = labelz, autopct=value)
plt.show()

Looking at the pie chart of our Twitter data above, there are some very interesting things to notice. For starters, Yae Miko seems to be by far the most popular character, while Eula and Kamisato Ayaka seem to be the least popular. Eula and Kamisato Ayaka have the two oldest drip marketing posts in our dataset, so our data is probably skewed by the size of Genshin's Twitter following at the time of each post. Assuming that the following has grown over time, the number of likes and retweets on a post should also grow over time. Also, as supported by the sales data above, Raiden Shogun is the highest-revenue character in the game, which should correspond to being the most popular, so her sitting in the middle of the pack here can likewise be attributed to her drip marketing post being on the older side. On the flip side, Yae Miko is the most popular character here even though hers is not the newest drip marketing post: Kamisato Ayato's post is newer, yet he sits at about average popularity. In conclusion, the date of a drip marketing post appears to affect its likes and retweets, although it does not explain everything.

Sadly, there is no dataset available that tracks Genshin's Twitter following over time. Otherwise, our next step would be to scale these values by the follower count on each post's date for more accuracy. We will still use this data to check for any possible correlations.
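For illustration only, here is a rough sketch of what that scaling could look like. The Post Activity values come from the table above, but the follower counts are made-up placeholders, since no real follower history was available to us. The column names are hypothetical as well.

```python
import pandas as pd

# Post Activity values are taken from the table above.
# The follower counts are MADE-UP placeholders, not real data.
posts = pd.DataFrame({
    'Character': ['Eula', 'Yae Miko'],
    'Post Activity': [70376, 416082],
    'Followers at Post Date (hypothetical)': [1_000_000, 2_000_000],
})

# Engagement per 1,000 followers removes the effect of audience growth.
posts['Activity per 1k Followers'] = (
    posts['Post Activity'] / posts['Followers at Post Date (hypothetical)'] * 1000
).round(3)
print(posts)
```

With real follower counts, the scaled column would make an older post and a newer post directly comparable.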

Survey Data¶

image.png Another way we came up with to gather data was to create a Google Form survey. We shared this form in about 20 different Discord servers of Genshin players to find respondents. We made sure to post it in the Mains servers for all the characters in our dataset to level out any biases. The purpose was to get a local sample of Genshin wish behavior. For each character in our dataset, from Hu Tao to Kamisato Ayaka, we ask whether the respondent rolled on their banner. If they did, they are prompted to check off as many of the following reasons as apply: Character Design, Gameplay, Meta Relevance, In-Game Lore, and Voice Acting Cast. If not, they are not asked anything else about that specific character.

This survey was great because we not only get an idea of the % of people who rolled for these characters, but also insight into the reasons why. The responses were downloaded as an Excel file and then parsed to tally these figures.
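As a small illustration of the tallying step, the sketch below uses a few made-up responses in the same shape as the survey export (the rows are invented, not real answers). It counts the Yes/No answers and splits the comma-separated reasons using pandas' str.get_dummies; this is one possible approach and not necessarily the one used in the cells further down.

```python
import pandas as pd

# Toy responses in the same shape as the survey export (made-up rows).
toy = pd.DataFrame({
    'Did you wish for Hu Tao?': ['Yes', 'No', 'Yes', 'Yes'],
    'Why did you wish for Hu Tao?': ['Character Design, Gameplay', None,
                                     'Gameplay',
                                     'Character Design, Voice Acting Cast'],
})

# Count how many respondents answered Yes and No.
wished = toy['Did you wish for Hu Tao?'].value_counts()

# Split each multi-select answer on ', ' into one indicator column per reason,
# then sum the columns to get the number of respondents who checked each reason.
reasons = toy['Why did you wish for Hu Tao?'].dropna().str.get_dummies(sep=', ').sum()

print(wished)
print(reasons)
```

Respondents who answered No leave the reasons column empty, so dropping the missing values first keeps them out of the reason counts.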

We tried our best to keep this survey sample unbiased by sharing it widely across public Genshin Impact Discord servers. For example, sharing it only in "Ganyu Mains" would skew the data in favor of Ganyu. To counter this, we shared the survey across servers dedicated to all characters as well as some neutral servers such as "Keqing Mains", commonly known as KQM.

If you're interested in learning about meta-related matters for Genshin Impact, we recommend joining the Keqing Mains Discord and checking out their theorycrafting libraries. Theorycrafting is a massive part of the Genshin Impact community, as it quantifies how effective and efficient a character is in the meta.

We have not done a fully meta-based analysis in our project because nearly every Genshin player you ask (no matter how much they actually understand the game) will tell you that "Ganyu and Hu Tao are the best". This would skew our analysis heavily, and we wish to avoid such a predicament.

Here is an example of a "meta" team if you are interested: https://youtu.be/YEMUdnhU7A4

The data from this survey is what was gathered up until 5/15/22.

Our Survey

In [44]:
#extracts data from a survey about genshin wishes we created
t1 = pd.read_excel('Genshin_Form.xlsx')
form = pd.DataFrame(t1, columns = ['Did you wish for Hu Tao?', 'Why did you wish for Hu Tao?',
                                  'Did you wish for Eula?', 'Why did you wish for Eula?', 
                                  'Did you wish for Albedo?', 'Why did you wish for Albedo?',
                                   'Did you wish for Arataki Itto?', 'Why did you wish for Arataki Itto?',
                                   'Did you wish for Shenhe?', 'Why did you wish for Shenhe?', 
                                   'Did you wish for Xiao?', 'Why did you wish for Xiao?', 
                                   'Did you wish for Zhongli?', 'Why did you wish for Zhongli?',
                                   'Did you wish for Ganyu?', 'Why did you wish for Ganyu?',
                                   'Did you wish for Yae Miko?', 'Why did you wish for Yae Miko?',
                                   'Did you wish for Raiden Shogun?', 'Why did you wish for Raiden Shogun?',
                                   'Did you wish for Sangonomiya Kokomi?', 'Why did you wish for Sangonomiya Kokomi?',
                                   'Did you wish for Kamisato Ayato?', 'Why did you wish for Kamisato Ayato?',
                                   'Did you wish for Venti?', 'Why did you wish for Venti?', 
                                   'Did you wish for Kamisato Ayaka?', 'Why did you wish for Kamisato Ayaka?'])
form
Out[44]:
Did you wish for Hu Tao? Why did you wish for Hu Tao? Did you wish for Eula? Why did you wish for Eula? Did you wish for Albedo? Why did you wish for Albedo? Did you wish for Arataki Itto? Why did you wish for Arataki Itto? Did you wish for Shenhe? Why did you wish for Shenhe? ... Did you wish for Raiden Shogun? Why did you wish for Raiden Shogun? Did you wish for Sangonomiya Kokomi? Why did you wish for Sangonomiya Kokomi? Did you wish for Kamisato Ayato? Why did you wish for Kamisato Ayato? Did you wish for Venti? Why did you wish for Venti? Did you wish for Kamisato Ayaka? Why did you wish for Kamisato Ayaka?
0 Yes Character Design, Voice Acting Cast Yes Character Design, Gameplay No NaN No NaN No NaN ... No NaN Yes Character Design, Gameplay, Voice Acting Cast Yes Character Design, Gameplay, In-Game Lore, Voic... Yes Meta Relevance Yes Character Design, Gameplay, Meta Relevance, In...
1 Yes Character Design, Gameplay, Meta Relevance, Vo... No NaN No NaN No NaN No NaN ... Yes Character Design, Meta Relevance No NaN No NaN Yes Character Design, Gameplay, Meta Relevance Yes Character Design, Gameplay
2 Yes Gameplay, Meta Relevance Yes Gameplay, Meta Relevance No NaN No NaN Yes Character Design, Gameplay, Meta Relevance ... Yes Character Design, Gameplay, Meta Relevance Yes Character Design, Gameplay, Meta Relevance, In... No NaN No NaN Yes Gameplay, Meta Relevance
3 Yes Character Design, Gameplay, Voice Acting Cast Yes Character Design, Gameplay No NaN No NaN Yes Character Design ... Yes Character Design, Gameplay, Meta Relevance, In... Yes Character Design, Gameplay, Meta Relevance, In... No NaN No NaN No NaN
4 Yes Character Design, Gameplay, Meta Relevance, In... No NaN Yes Character Design, Gameplay, Meta Relevance, In... No NaN No NaN ... Yes Character Design, Gameplay, Meta Relevance, In... Yes Character Design, Gameplay, Meta Relevance, In... No NaN Yes In-Game Lore Yes Character Design, Gameplay, Meta Relevance, In...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
97 Yes Gameplay, Voice Acting Cast Yes Character Design, Gameplay, Voice Acting Cast Yes Gameplay No NaN Yes Character Design, Gameplay, Voice Acting Cast ... Yes Character Design, Gameplay, Meta Relevance, In... Yes Gameplay, Meta Relevance Yes Gameplay, In-Game Lore, Voice Acting Cast Yes Gameplay, In-Game Lore Yes Character Design, Gameplay, Meta Relevance, In...
98 No NaN No NaN No NaN No NaN No NaN ... No NaN Yes Character Design, Gameplay, Meta Relevance, In... No NaN No NaN Yes Character Design
99 Yes Character Design, Gameplay, Meta Relevance, Vo... No NaN No NaN Yes Character Design, Gameplay, Meta Relevance, In... No NaN ... No NaN No NaN Yes Character Design, Gameplay No NaN Yes Character Design, Gameplay, Meta Relevance, In...
100 Yes Character Design, Gameplay, Meta Relevance, In... No NaN No NaN No NaN Yes Character Design, Gameplay, In-Game Lore ... Yes Character Design, Gameplay, Meta Relevance, In... Yes Character Design, Gameplay, Meta Relevance No NaN No NaN Yes Character Design, Gameplay, Meta Relevance, In...
101 Yes Character Design Yes Character Design Yes Character Design No NaN Yes In-Game Lore ... Yes Character Design, Meta Relevance Yes Character Design, In-Game Lore No NaN Yes Character Design, Gameplay, Meta Relevance, In... No NaN

102 rows × 28 columns

In [13]:
#tracks the counts for each possible selection in the survey per character
characters = ['Hu Tao', 'Eula', 'Albedo', 'Arataki Itto', 'Shenhe', 'Xiao', 'Zhongli', 'Ganyu',
              'Yae Miko', 'Raiden Shogun', 'Sangonomiya Kokomi', 'Kamisato Ayato', 'Venti',
              'Kamisato Ayaka']
reasons = ['Character Design', 'Gameplay', 'Meta Relevance', 'In-Game Lore', 'Voice Acting Cast']

#one counter dictionary per character, keyed by character name
counters = {}
for char in characters:
    counter = {key: 0 for key in ['Yes', 'No'] + reasons}
    #count the Yes/No answers
    for answer in form['Did you wish for ' + char + '?']:
        counter[answer] += 1
    #count each reason checked by the respondents who answered Yes
    whyWish = form['Why did you wish for ' + char + '?'].dropna()
    for response in whyWish:
        #named 'reason' rather than 'list' so the built-in list() is not shadowed
        for reason in response.split(', '):
            counter[reason] += 1
    counters[char] = counter

#collects the counts into one column per survey option
popularity = {'Character': characters,
              'Wished For': [counters[c]['Yes'] for c in characters],
              'Not Wished For': [counters[c]['No'] for c in characters]}
for reason in reasons:
    popularity['Wished for ' + reason] = [counters[c][reason] for c in characters]

pop = pd.DataFrame(popularity)
pop
Out[13]:
Character Wished For Not Wished For Wished for Character Design Wished for Gameplay Wished for Meta Relevance Wished for In-Game Lore Wished for Voice Acting Cast
0 Hu Tao 63 39 45 39 34 16 27
1 Eula 47 55 39 30 14 14 11
2 Albedo 41 61 32 30 5 23 16
3 Arataki Itto 36 66 30 23 6 23 19
4 Shenhe 31 71 26 13 7 14 7
5 Xiao 40 62 30 32 17 24 17
6 Zhongli 73 29 53 50 46 50 34
7 Ganyu 44 58 38 30 31 19 19
8 Yae Miko 38 64 35 19 3 26 15
9 Raiden Shogun 73 29 59 54 48 45 29
10 Sangonomiya Kokomi 40 62 30 30 16 22 14
11 Kamisato Ayato 41 61 34 33 10 18 17
12 Venti 59 43 32 44 36 27 14
13 Kamisato Ayaka 64 38 48 47 39 24 26
In [14]:
#creates a cleaner dataframe showing the % of respondents who wished for each character and, among those who did, the % who chose each reason
popfinal = {'Character': [], 'Wished %': [], 'Design %': [], 'Gameplay %': [], 'Meta %': [], 'Lore %': [], 'Voice Acting %': []}
popf = pd.DataFrame(popfinal)

popf['Character'] = pop['Character']
popf['Wished %'] = round((pop['Wished For'] / (pop['Wished For'] + pop['Not Wished For']) * 100), 3)
popf['Design %'] = round((pop['Wished for Character Design'] / pop['Wished For'] * 100), 3)
popf['Gameplay %'] = round((pop['Wished for Gameplay'] / pop['Wished For'] * 100), 3)
popf['Meta %'] = round((pop['Wished for Meta Relevance'] / pop['Wished For'] * 100), 3)
popf['Lore %'] = round((pop['Wished for In-Game Lore'] / pop['Wished For'] * 100), 3)
popf['Voice Acting %'] = round((pop['Wished for Voice Acting Cast'] / pop['Wished For'] * 100))

popf
Out[14]:
Character Wished % Design % Gameplay % Meta % Lore % Voice Acting %
0 Hu Tao 61.765 71.429 61.905 53.968 25.397 43.0
1 Eula 46.078 82.979 63.830 29.787 29.787 23.0
2 Albedo 40.196 78.049 73.171 12.195 56.098 39.0
3 Arataki Itto 35.294 83.333 63.889 16.667 63.889 53.0
4 Shenhe 30.392 83.871 41.935 22.581 45.161 23.0
5 Xiao 39.216 75.000 80.000 42.500 60.000 42.0
6 Zhongli 71.569 72.603 68.493 63.014 68.493 47.0
7 Ganyu 43.137 86.364 68.182 70.455 43.182 43.0
8 Yae Miko 37.255 92.105 50.000 7.895 68.421 39.0
9 Raiden Shogun 71.569 80.822 73.973 65.753 61.644 40.0
10 Sangonomiya Kokomi 39.216 75.000 75.000 40.000 55.000 35.0
11 Kamisato Ayato 40.196 82.927 80.488 24.390 43.902 41.0
12 Venti 57.843 54.237 74.576 61.017 45.763 24.0
13 Kamisato Ayaka 62.745 75.000 73.438 60.938 37.500 41.0
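As a quick worked check of how these percentages are derived, the sketch below recomputes two of Hu Tao's values from the counts in the earlier table. Note that Wished % is taken over all respondents, while the reason percentages are taken only over the respondents who wished for the character.

```python
# Hu Tao's counts from the earlier table.
wished, not_wished, design = 63, 39, 45

# Wished %: share of ALL respondents who rolled for the character.
wished_pct = round(wished / (wished + not_wished) * 100, 3)

# Design %: share of those who rolled that checked Character Design.
design_pct = round(design / wished * 100, 3)

print(wished_pct, design_pct)
```

Because the denominators differ, a character can have a low Wished % and still a very high Design %, which is worth keeping in mind when reading the bar plot.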
In [15]:
#multiple bar plot creation for dataframe shown above
chars_bar = popf['Character']
wished_bar = popf['Wished %']
design_bar = popf['Design %']
gp_bar = popf['Gameplay %']
meta_bar = popf['Meta %']
lore_bar = popf['Lore %']
va_bar = popf['Voice Acting %']

x_axis = np.arange(len(chars_bar))

f, ax = plt.subplots(figsize=(25,10))

plt.bar(x_axis -0.25, wished_bar, width=0.1, label = 'Wished %')
plt.bar(x_axis -0.15, design_bar, width=0.1, label = 'Design %')
plt.bar(x_axis -0.05, gp_bar, width=0.1, label = 'Gameplay %')
plt.bar(x_axis +0.05, meta_bar, width=0.1, label = 'Meta %')
plt.bar(x_axis +0.15, lore_bar, width=0.1, label = 'Lore %')
plt.bar(x_axis +0.25, va_bar, width=0.1, label = 'Voice Acting %')

plt.xticks(x_axis,chars_bar)

ax.legend(fontsize=10)

plt.show()

Again, looking at the grouped bar graph of our character dataset above, we notice some interesting things. We'll break these down by category as follows:

  • Wished %
    • The two most wished-for characters are Raiden Shogun and Zhongli. Based on our plot and dataframe, these two clearly have the most balanced data across the board, all on the higher end, so this makes total sense. Both are fan favorites, so this is no surprise whatsoever.
    • On the contrary, Shenhe was the least wished-for character, and her data is heavily skewed toward the lower end. However, it can be argued that Shenhe was unlucky: her release was alongside Xiao and right before Zhongli and Ganyu, which alone made her less desirable to roll for at the time.
  • Design %
    • Since the Design % column has the highest numbers overall, it is fair to say that the biggest driving factor in rolling for a character is the design of the character.
    • Yae Miko had the highest share of wishes influenced by character design. This is not surprising given our Twitter data earlier, where the likes and retweets on her drip marketing post far exceeded all others. Interestingly, her Wished % is among the very lowest, but this could very well be due to her being one of the newest characters in the game.
    • Venti is by far the lowest in our data in terms of design. We don't know why this is the case, but it is a shame since he is the Anemo Archon, an extremely important role in the game.
  • Gameplay %
    • After Design %, this category is the second-largest driving factor in character wishes based on our data.
    • The characters wished for most because of gameplay are Kamisato Ayato and Xiao. This makes sense for Kamisato Ayato, as he is an extremely new character with entirely new gameplay mechanics that have yet to reach their full potential, though he is still rolled for his design slightly more than for his gameplay. Xiao is known as a fun character to play, as he can jump very high and repeatedly plunge onto his enemies to deal a good amount of damage; he is also the only Genshin Impact character who can currently perform aerial combos in combat. Among the characters who scored high in both design and gameplay, Xiao was the only one whose Gameplay % exceeded his Design %.
    • Shenhe was the lowest in the gameplay category, which makes sense given that she fills a rather underrated and niche supportive role.
  • Meta %
    • The character wished for because of meta more than any other was Ganyu. This is no surprise, as the community's general consensus is that she is one of the best characters in the game, if not the best. Her gameplay is simple but extremely effective, so she is definitely desired.
    • Yae Miko was by far the lowest in this category, possibly due to her RNG-based damage output. She's not an amazing character, but that % is surprisingly low for any 5* character in the game.
  • Lore %
    • Zhongli and Yae Miko were the most popular when it came to lore in our data. Zhongli makes sense: he's not only an archon but the most powerful one in lore (easily the strongest character overall in lore), and there are so many cool stories about him. For Yae Miko, it could only really be her story involvement in Inazuma.
    • Hu Tao, although pretty well rounded in every other category, is lacking in this department. As fun as she is, she's just a girl who works at the funeral parlor.
  • Voice Acting %
    • Arataki Itto's voice actor seems to have the most influence in our dataset, which is a surprise given his extremely low number of favorites on MAL. We suspect it may be due to games like Persona 5, in which he voice acted (in English, not Japanese), which could appeal to Discord gamers.
    • The voice actors of Venti, Shenhe, and Eula did not have much influence in our dataset. This is no surprise, as they are all on the lower end of our MAL favorites dataset.
    • The voice actors of Hu Tao and Ayaka, who are by far the most popular in our MAL dataset, definitely had a solid deal of influence here.

Basic Linear Regression Analysis¶

In [47]:
#plots sales in Japan against the likes + retweets of a character's 'drip marketing' post, including the regression line
#and a predicted point for the upcoming character Yelan based on her likes + retweets
import numpy as np
import matplotlib.pyplot as plt
annotate = twitter2['Character'].values
x_data = twitter2['Post Activity'].values
y_data = sales_JP['Sales Japan (in USD)'].values
z = np.polyfit(x = x_data, y = y_data, deg=1)
f = np.poly1d(z)
x_new = np.linspace(x_data.min(), x_data.max(), 500)
y_new = f(x_new)
y_test = f(181184) #181184 - Yelan's likes + retweets
plt.figure(figsize = (20,10))
plt.plot(x_data, y_data,'o',x_new,y_new)
plt.scatter(x_data,y_data, s = 100, color = "green", marker ="^")
plt.scatter(181184, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("Likes and Retweets of Banner")
plt.ylabel("Sales Data in Japan (USD)")
plt.title("Sales Data Japan (USD) vs Total of Likes and Retweets of a Character ",fontsize=40)
for i, label in enumerate(annotate):
    plt.annotate(label, (x_data[i], y_data[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
15729667.877619218

Here is a graph of the total likes and retweets of each banner compared to the sales data in Japan (USD). The two do not appear to be correlated. This is not entirely surprising: as previously mentioned, we collected English Twitter data because it had more drip marketing content, and that English-language engagement did not correlate with the sales data in Japan. Therefore, this linear regression is not representative of the game's data.

In [48]:
#plots sales in China against the likes + retweets of a character's 'drip marketing' post, including the regression line
#and a predicted point for the upcoming character Yelan based on her likes + retweets
annotate = twitter2['Character'].values
x_data1 = twitter2['Post Activity'].values
y_data1 = sales_CN['Sales China iOS (in USD)'].values
z = np.polyfit(x = x_data1, y = y_data1, deg=1)
f = np.poly1d(z)
x_new = np.linspace(x_data1.min(), x_data1.max(), 100)
y_new = f(x_new)
y_test = f(181184) #181184 - Yelan's likes + retweets
plt.figure(figsize = (20,10))
plt.plot(x_data1, y_data1,'o',x_new,y_new)
plt.scatter(x_data1,y_data1, s = 100, color = "pink")
plt.scatter(181184, y_test, s = 100, color = "purple")
plt.xlabel("Likes and Retweets of Banner")
plt.ylabel("Sales Data in China (USD)")
plt.title("Sales Data China (USD) vs Total of Likes and Retweets of a Character ",fontsize=40)
for i, label in enumerate(annotate):
    plt.annotate(label, (x_data1[i], y_data1[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
20285617.770686075

This is a graph of the total likes and retweets of each banner compared to the sales data in China (USD). Once again, we collected data from English Twitter to acquire more drip marketing content, so this data does not reflect the actual sales data in China. Therefore, this linear regression is also not representative of the game's data.
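One way to put a number on how well a regression line represents the data is the squared Pearson correlation, which equals R² for a one-variable linear fit. The sketch below is a minimal illustration on synthetic values only (not our sales or Twitter data), and r_squared is a hypothetical helper name introduced for this example.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 for a one-variable linear fit."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

# Synthetic data only, to show the two extremes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 3.0 * x + 2.0                            # perfectly linear in x
y_unrelated = np.array([5.0, 1.0, 4.0, 1.0, 5.0])   # no linear trend in x

print(r_squared(x, y_linear))      # close to 1: the line explains the data
print(r_squared(x, y_unrelated))   # close to 0: the line explains nothing
```

Calling the same helper on x_data and y_data from the cells above would give a single value to back up the visual impression of the plots.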

In [49]:
#plot correlating Sales in japan and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
annotations = plot_data['Characters'].to_numpy() 
X_Plots = plot_data['#Seiyuu Favorites'].to_numpy().astype(str).astype(int)
Y_Plots =  plot_data['Japan Sales in USD'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "blue")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in Japan (USD)")
plt.title("Sales Data in Japan (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left') 
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
15615379.470786579

This graph compares the number of seiyuu favorites on MyAnimeList with the sales data in Japan (USD). We can see a clear positive correlation between the two variables: when the number of seiyuu favorites is higher, the sales in Japan are higher as well. Because this is a good representation of the game's data, we used the linear regression line to predict Yelan's sales in Japan. With 16,319 seiyuu favorites, the line predicts roughly $15.6 million (15615379.47 USD) in Japan sales for Yelan.

In [50]:
#plot correlating Sales in china and the favorites of seiyuus on MAL. includes regression line
#and includes predictive point for upcoming character yelan based on her voice actors favorites on MAL
annotations = plot_data['Characters'].to_numpy() 
X_Plots = plot_data['#Seiyuu Favorites'].to_numpy().astype(str).astype(int)
Y_Plots =  plot_data['China Sales in USD'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "red")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in China (USD)")
plt.title("Sales Data in China (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left') 
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
20151637.914954036

This graph portrays the number of seiyuu favorites on MyAnimeList compared with sales data in China (USD). Once again, we can see a clear positive correlation between the two variables: when a seiyuu has more favorites, sales in China tend to be higher as well. Because the data follows this trend well, we used the linear regression line to predict the sales in China for Yelan's banner. Given 16319 seiyuu favorites, the line predicts 20151637.914954036 USD (about 20.2 million USD) in China sales for Yelan.

In [51]:
# Plot combined sales in Japan and China against the seiyuu's favorites on MAL, with a regression line
# and a predicted point for the upcoming character Yelan based on her voice actor's favorites on MAL
annotations = plot_data['Characters']
X_Plots = plot_data['#Seiyuu Favorites'].astype(str).astype(int)
Y_Plots =  plot_data['China and Japan Combined Sales']
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")
plt.xlabel("#Seiyuu Favorites")
plt.ylabel("Sales Data in China and Japan (USD)")
plt.title("Sales Data in China and Japan (USD) vs Number of Favorites for Seiyuus ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, xy = (X_Plots[i], Y_Plots[i]), xytext = (X_Plots[i] + 140 , Y_Plots[i] + 140) , ha = 'left') 
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
35767017.385740615

This graph portrays the number of seiyuu favorites on MyAnimeList compared with the combined sales data in China and Japan (USD). We can see a clear positive correlation between the two variables: when a seiyuu has more favorites, combined sales in China and Japan tend to be higher as well. Because the data follows this trend well, we used the linear regression line to predict the combined sales for Yelan's banner. Given 16319 seiyuu favorites, the line predicts 35767017.385740615 USD (about 35.8 million USD) in combined China and Japan sales for Yelan.
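The three printed predictions are consistent with each other: 15615379.470786579 + 20151637.914954036 = 35767017.385740615. This is what we would expect if the combined column is the sum of the two regional columns, because a least-squares line fit is linear in the y-values, so fitting the sum gives the sum of the fits. The sketch below illustrates that property; `linear_prediction` is a hypothetical helper and all numbers are made-up placeholders, not our data.

```python
import numpy as np

def linear_prediction(x, y, x_query):
    """Fit a degree-1 least-squares line of y on x and evaluate it at x_query."""
    coeffs = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), deg=1)
    return float(np.polyval(coeffs, x_query))

# Placeholder values for illustration only (NOT the notebook's data).
favorites = [2000, 6000, 11000, 15000, 21000]
japan = [3.0e6, 6.5e6, 8.0e6, 12.0e6, 15.5e6]
china = [5.0e6, 9.0e6, 13.5e6, 15.0e6, 22.0e6]
combined = [j + c for j, c in zip(japan, china)]

query = 16319  # favorites value used for the predicted point in the cells above
pred_japan = linear_prediction(favorites, japan, query)
pred_china = linear_prediction(favorites, china, query)
pred_combined = linear_prediction(favorites, combined, query)

# The two quantities agree up to floating-point rounding.
print(pred_japan + pred_china, pred_combined)
```

In practice this means the combined plot adds no independent evidence beyond the two regional plots; it summarizes them.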

In [52]:
# Plot rolls from our character banner data against the seiyuu's favorites on MAL, with a regression line
# and a predicted point for the upcoming character Yelan based on her voice actor's favorites on MAL
count1 = [227, 425, 113, 561, 413, 102, 63, 531, 431, 336, 52, 498, 49, 578]
df = pd.DataFrame(count1)
#seiyuu_df = seiyuu_df.drop([14])
seiyuu_df['#MAL Favorites'] = seiyuu_df['#MAL Favorites'].astype(int)
annotations = seiyuu_df['Character'].to_numpy()
Y_Plots = df[0].to_numpy()
X_Plots = seiyuu_df['#MAL Favorites'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(16319)
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(16319, y_test, s = 100, color = "blue", marker ="s")

# X holds MAL favorites and Y holds rolls, so label the axes accordingly
plt.xlabel("#MAL Favorites")
plt.ylabel("Rolls on Each Banner")
plt.title("Rolls for the Character of Banner vs #MAL Favorites ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.annotate('Yelan', (16319, y_test))
print(f(16319))
plt.show()
312.93219775491764

This graph compares the rolls on each banner with the MyAnimeList favorites of each character's seiyuu. There is a slight positive correlation, which matches the survey data we have above: voice acting did not play a major role in rolling for a character, but it still played a role. Therefore, given 16319 MAL favorites for Yelan's seiyuu, the regression line predicts about 313 rolls (312.93219775491764) on Yelan's banner within our wish data.

In [53]:
# Plot rolls from our character banner data against Twitter likes and retweets, with a regression line
# and a predicted point for the upcoming character Yelan based on her likes + retweets
count2 = [578, 498, 388, 431, 413, 561, 425]
df = pd.DataFrame(count2)
annotations = twitter2['Character'].to_numpy()
Y_Plots = df[0].to_numpy()
X_Plots = twitter2['Post Activity'].to_numpy()
z = np.polyfit(x = X_Plots, y = Y_Plots, deg=1)
f = np.poly1d(z)
x_new = np.linspace(X_Plots.min(), X_Plots.max(), 100)
y_new = f(x_new)
y_test = f(181184) # 181184 = Yelan's likes & retweets
plt.figure(figsize = (20,10))
plt.plot(X_Plots, Y_Plots,'o',x_new,y_new)
plt.scatter(X_Plots,Y_Plots, s = 40, color = "green")
plt.scatter(181184, y_test, s = 100, color = "blue", marker ="s")

# X holds Twitter post activity and Y holds rolls, so label the axes accordingly
plt.xlabel("Twitter Likes & Retweets")
plt.ylabel("Rolls on Each Banner")
plt.title("Rolls for the Character of Banner vs Twitter Likes & Retweets ",fontsize=40)
for i, label in enumerate(annotations):
    plt.annotate(label, (X_Plots[i], Y_Plots[i]))
plt.annotate('Yelan', (181184, y_test))
print(f(181184))
plt.show()
474.73909806068013

This graph compares the rolls on each banner with the Twitter likes and retweets. It is incomplete because, as stated above, we scraped English Twitter, so it does not reflect the game's data well. With only 7 data points, we do not have enough data to support any conclusion from this graph.
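One way to see how fragile a fit on so few points can be is to refit the line while leaving out one banner at a time and watch how much the prediction moves. This is a minimal sketch and was not run as part of our analysis: `leave_one_out_predictions` is a hypothetical helper, and the activity and roll values are made-up placeholders, not our data.

```python
import numpy as np

def leave_one_out_predictions(x, y, x_query):
    """Refit a degree-1 line with each point removed in turn; return all predictions at x_query."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # boolean mask that drops point i
        coeffs = np.polyfit(x[keep], y[keep], deg=1)
        preds.append(np.polyval(coeffs, x_query))
    return np.array(preds)

# Placeholder values for illustration only (NOT the notebook's data).
activity = [60000, 85000, 90000, 120000, 140000, 150000, 170000]
rolls = [560, 500, 400, 430, 410, 550, 430]

preds = leave_one_out_predictions(activity, rolls, x_query=180000)
print(f"prediction range: {preds.min():.1f} to {preds.max():.1f}")
```

A wide range between the smallest and largest prediction would indicate that a single banner can swing the result, which is the practical meaning of not having enough data.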

Takeaways¶

  • Seiyuu choices for playable promotional characters impact sales data in China and Japan
    • See Saori Hayami and Rie Takahashi as nice examples
    • We can also see that Yelan is placed along the trend due to Maaya Sakamoto
  • Twitter Data
    • Older posts have fewer likes and retweets, whereas newer posts have more likes and retweets
    • There's no real correlation between English Twitter data and Chinese and/or Japanese sales
  • Survey Data
    • Character Design & Gameplay were the greatest factors as to why people wished for characters
    • Survey Data correlates fairly well with the rest of our datasets for wishes and voice acting
  • Data Analysis
    • Positive correlations between # of favorites on MAL and both sales and character rolls
    • Negative correlations for Twitter Data

Future Improvements¶

  • On the survey, it would've been a great idea to ask whether people were going to roll for Yelan and to present the same 5 checkboxes to those who planned to do so
    • This could've been used to correlate Yelan 'want to wish for' data with our predictions for Yelan.
  • Survey could've also had another checkbox with "Rolled for No Particular Reason"
    • This is feedback we received from some of our respondents.
  • Being a part of the Twitter Developer Platform in order to use Tweepy to scrape data instead of collecting it manually.
    • This would've enabled us to scrape hashtag data and run an ML model to classify reactions in the comments sections as "positive" or "negative".
  • We could have also brought in Chinese (CN) and English (EN) seiyuu data.
    • In this project we excluded it because popularity is not measured on a common scale across these groups
    • Furthermore, these seiyuu do not all have comparable data available to scrape directly. We avoided the hassle of coming up with a normalizing function and manually looking up each individual.
  • In the Japanese Seiyuu data, we could have pursued a logarithmic curve as well to predict Maaya Sakamoto's impact on Yelan's sales.
    • We couldn't perform this due to lack of data points. With our current data it would result in a very inaccurate prediction.
    • Given the circumstances, a linear fit was thus more accurate.
    • However, a logarithmic fit appears to be the way to go as the dataset gets larger.
  • We collected over 4300 rolls worth of data. This is quite a feat considering that wish history is difficult to procure. Yet, it is nowhere near enough.
    • Having people contribute to our data set for the future would help us develop more data.
    • This would enable the use of Machine Learning algorithms to make models and better predictive data
  • We could've also explored YouTube streamers' influence on people's rolls by utilizing the YouTube API
    • Checking streamers' Genshin Impact activity and seeing which character gets coverage and analyzing how much was "good" or "bad" would've added a very good angle to our data.
  • Quality of Life Change:
    • Instead of keeping 10 separate wish history files we could've merged into one. It didn't affect our analysis though.
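The logarithmic curve mentioned above for the Japanese seiyuu data could be fitted with the same `np.polyfit` call we already use, by regressing sales on the natural log of the favorites count. This is a minimal sketch under that assumption and was not run on our data: `fit_log_curve` and `predict_log_curve` are hypothetical helpers, and the values below are made-up placeholders.

```python
import numpy as np

def fit_log_curve(x, y):
    """Fit y = a * ln(x) + b by least squares and return (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0):
        raise ValueError("x must be strictly positive for a logarithmic fit")
    a, b = np.polyfit(np.log(x), y, deg=1)  # linear fit in ln(x)
    return a, b

def predict_log_curve(a, b, x_query):
    """Evaluate the fitted curve a * ln(x) + b at x_query."""
    return a * np.log(x_query) + b

# Placeholder values for illustration only (NOT the notebook's data).
favorites = [500, 2000, 6000, 12000, 25000]
sales = [3.0e6, 8.0e6, 11.5e6, 14.0e6, 16.5e6]

a, b = fit_log_curve(favorites, sales)
print(predict_log_curve(a, b, 16319))
```

With more banners, comparing the residuals of this fit against the linear fit would show which shape describes the data better.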

Special thanks to our friends Wallace Santos (BR) and Alex Dai (CN) for giving us high quality screen recordings to share